
A Distributional View of High Dimensional Optimization

Benning, Felix

arXiv.org Machine Learning

This PhD thesis presents a distributional view of optimization in place of a worst-case perspective. We motivate this view with an investigation of the failure point of classical optimization. Subsequently, we consider the optimization of a randomly drawn objective function, which is the setting of Bayesian optimization. After a review of Bayesian optimization, we outline how such a distributional view may explain the predictable progress of optimization in high dimensions. It further turns out that this distributional view provides insights into optimal step size control of gradient descent. To enable these results, we develop mathematical tools to deal with random input to random functions and a characterization of non-stationary isotropic covariance kernels. Finally, we outline how assumptions about the data, specifically exchangeability, can lead to random objective functions in machine learning, and we analyze their landscape.
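The Bayesian-optimization setting the abstract builds on can be sketched in a few lines of code: place a Gaussian-process prior on the objective and pick each new evaluation optimistically. Everything below (the RBF kernel, length scale, UCB rule, grid, and test function) is an illustrative choice of ours, not a construction from the thesis.

```python
import numpy as np

def rbf(a, b, ls=0.3):
    # Squared-exponential kernel between two 1-d point sets.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise=1e-6):
    # Standard zero-mean GP regression equations.
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(x_grid, x_obs)
    mean = Ks @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, np.maximum(var, 1e-12)

def bayes_opt(f, n_iter=15, kappa=2.0, seed=0):
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0.0, 1.0, 201)
    x_obs = rng.uniform(0.0, 1.0, size=2)        # two random starting points
    y_obs = f(x_obs)
    for _ in range(n_iter):
        mean, var = gp_posterior(x_obs, y_obs, x_grid)
        x_next = x_grid[np.argmax(mean + kappa * np.sqrt(var))]  # UCB rule
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, f(np.array([x_next]))[0])
    best = np.argmax(y_obs)
    return float(x_obs[best]), float(y_obs[best])

best_x, best_y = bayes_opt(lambda x: np.sin(6.0 * x) * x)
```

Sampling the objective from a GP with a kernel like `rbf` is exactly the "randomly drawn objective function" viewpoint: the algorithm's progress is then a statement about the distribution of objectives rather than a worst case.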


GenAI-Powered Inference

Imai, Kosuke, Nakamura, Kentaro

arXiv.org Machine Learning

We introduce GenAI-Powered Inference (GPI), a statistical framework for both causal and predictive inference using unstructured data, including text and images. GPI leverages open-source Generative Artificial Intelligence (GenAI) models - such as large language models and diffusion models - not only to generate unstructured data at scale but also to extract low-dimensional representations that capture their underlying structure. Applying machine learning to these representations, GPI enables estimation of causal and predictive effects while quantifying associated estimation uncertainty. Unlike existing approaches to representation learning, GPI does not require fine-tuning of generative models, making it computationally efficient and broadly accessible. We illustrate the versatility of the GPI framework through three applications: (1) analyzing Chinese social media censorship, (2) estimating predictive effects of candidates' facial appearance on electoral outcomes, and (3) assessing the persuasiveness of political rhetoric. An open-source software package is available for implementing GPI.
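As a rough illustration of the GPI recipe (and not the package's actual API), the following simulates low-dimensional representations standing in for GenAI embeddings, then estimates a causal effect by OLS adjustment on those representations, with an analytic standard error; all names and numbers are hypothetical.

```python
import numpy as np

# Simulated GPI-style analysis: low-dimensional representations Z stand in for
# GenAI embeddings of unstructured data, and a causal effect is estimated by
# OLS adjustment on Z with an analytic standard error. All numbers are made up.
rng = np.random.default_rng(1)
n, d = 2000, 8
Z = rng.normal(size=(n, d))                      # stand-in representations

propensity = 1.0 / (1.0 + np.exp(-Z[:, 0]))      # Z confounds the treatment
T = rng.binomial(1, propensity)
Y = 1.5 * T + Z @ np.full(d, 0.5) + rng.normal(size=n)   # true effect = 1.5

X = np.column_stack([np.ones(n), T, Z])          # intercept, treatment, controls
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ coef
sigma2 = resid @ resid / (n - X.shape[1])
se = float(np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1]))
tau_hat = float(coef[1])
```

The point of the sketch is the division of labor the abstract describes: the representation extraction happens once upstream, and downstream estimation plus uncertainty quantification is ordinary statistics on the low-dimensional `Z`.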


Optimal Depth of Neural Networks

Qi, Qian

arXiv.org Artificial Intelligence

Determining the optimal depth of a neural network is a fundamental yet challenging problem, typically resolved through resource-intensive experimentation. This paper introduces a formal theoretical framework to address this question by recasting the forward pass of a deep network, specifically a Residual Network (ResNet), as an optimal stopping problem. We model the layer-by-layer evolution of hidden representations as a sequential decision process where, at each layer, a choice is made between halting computation to make a prediction or continuing to a deeper layer for a potentially more refined representation. This formulation captures the intrinsic trade-off between accuracy and computational cost. Our primary theoretical contribution is a proof that, under a plausible condition of diminishing returns on the residual functions, the expected optimal stopping depth is provably finite, even in an infinite-horizon setting. We leverage this insight to propose a novel and practical regularization term, $\mathcal{L}_{\rm depth}$, that encourages the network to learn representations amenable to efficient, early exiting. We demonstrate the generality of our framework by extending it to the Transformer architecture and exploring its connection to continuous-depth models via free-boundary problems. Empirical validation on ImageNet confirms that our regularizer successfully induces the theoretically predicted behavior, leading to significant gains in computational efficiency without compromising, and in some cases improving, final model accuracy.
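The finite-expected-depth claim can be illustrated with a toy instance of the stopping problem: if accuracy gains from additional residual blocks decay geometrically while each layer has a fixed compute cost, the utility-maximizing depth is finite. The decay rate and per-layer cost below are our own illustrative constants, not values from the paper.

```python
import numpy as np

def utility(k, r=0.7, cost=0.01):
    # Utility of stopping at layer k: accuracy with geometrically diminishing
    # gains from depth, minus a fixed per-layer compute cost (toy constants).
    return (1.0 - 0.5 * r ** k) - cost * k

depths = np.arange(200)        # 200 layers stand in for an infinite horizon
k_star = int(np.argmax(utility(depths)))
```

With these constants the marginal accuracy gain `0.15 * 0.7**k` drops below the per-layer cost between layers 8 and 9, so the optimum is at depth 8 despite the (effectively) infinite horizon, mirroring the diminishing-returns condition in the theorem.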


Factored space models: Towards causality between levels of abstraction

Garrabrant, Scott, Mayer, Matthias Georg, Wache, Magdalena, Lang, Leon, Eisenstat, Sam, Dell, Holger

arXiv.org Artificial Intelligence

Causality plays an important role in understanding intelligent behavior, and there is a wealth of literature on mathematical models for causality, most of which is focused on causal graphs. Causal graphs are a powerful tool for a wide range of applications, in particular when the relevant variables are known and at the same level of abstraction. However, the given variables can also be unstructured data, like pixels of an image. Meanwhile, the causal variables, such as the positions of objects in the image, can be arbitrary deterministic functions of the given variables. Moreover, the causal variables may form a hierarchy of abstractions, in which the macro-level variables are deterministic functions of the micro-level variables. Causal graphs are limited when it comes to modeling this kind of situation. In the presence of deterministic relationships there is generally no causal graph that satisfies both the Markov condition and the faithfulness condition. We introduce factored space models as an alternative to causal graphs which naturally represent both probabilistic and deterministic relationships at all levels of abstraction. Moreover, we introduce structural independence and establish that it is equivalent to statistical independence in every distribution that factorizes over the factored space. This theorem generalizes the classical soundness and completeness theorem for d-separation.
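A minimal numeric illustration of why determinism breaks faithfulness (our own toy construction, not the paper's formalism): let X be a fair bit and let Y = X and Z = X be deterministic copies. For the natural graph X → Y, X → Z, the extra independence X ⊥ Z | Y holds statistically even though X and Z are adjacent, so no d-separation statement implies it:

```python
import math
from collections import defaultdict

# X is a fair bit; Y = X and Z = X are deterministic copies of it.
joint = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}   # P(X, Y, Z)

def marginal(joint, idx):
    m = defaultdict(float)
    for key, p in joint.items():
        m[tuple(key[i] for i in idx)] += p
    return m

def cond_mi_xz_given_y(joint):
    # I(X; Z | Y) = sum p(x,y,z) * log2[ p(y) p(x,y,z) / (p(x,y) p(y,z)) ]
    py, pxy, pyz = marginal(joint, (1,)), marginal(joint, (0, 1)), marginal(joint, (1, 2))
    return sum(p * math.log2(py[(y,)] * p / (pxy[(x, y)] * pyz[(y, z)]))
               for (x, y, z), p in joint.items())

def mi_xz(joint):
    # I(X; Z) from the (X, Z) marginal.
    px, pz, pxz = marginal(joint, (0,)), marginal(joint, (2,)), marginal(joint, (0, 2))
    return sum(p * math.log2(p / (px[(x,)] * pz[(z,)]))
               for (x, z), p in pxz.items())

# I(X; Z | Y) = 0 although X and Z are adjacent in X -> Y, X -> Z, while
# I(X; Z) = 1 bit: an independence the graph's d-separation cannot encode.
```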


Universal Approximation Property of Random Neural Networks

Neufeld, Ariel, Schmocker, Philipp

arXiv.org Machine Learning

In this paper, we study random neural networks, which are single-hidden-layer feedforward neural networks whose weights and biases are randomly initialized. After this random initialization, only the linear readout needs to be trained, which can be performed efficiently, e.g., by the least squares method. By viewing random neural networks as Banach space-valued random variables, we prove a universal approximation theorem within a large class of Bochner spaces. In this setting, the corresponding Banach space can be significantly more general than the space of continuous functions over a compact subset of a Euclidean space; for example, it can be an $L^p$-space or a Sobolev space, where the latter includes the approximation of derivatives. Moreover, we derive approximation rates and an explicit algorithm to learn a deterministic function by a random neural network. In addition, we provide a full error analysis and study when random neural networks overcome the curse of dimensionality in the sense that the training costs scale at most polynomially in the input and output dimension. Furthermore, we show in two numerical examples the empirical advantages of random neural networks compared to fully trained deterministic neural networks.
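The train-only-the-readout idea is easy to sketch: sample hidden weights and biases once, freeze them, and fit the linear readout by least squares. Width, weight scale, and the target function below are illustrative choices, not the paper's settings.

```python
import numpy as np

# Random single-hidden-layer network: hidden weights and biases are sampled
# once and frozen; only the linear readout is trained, here by least squares.
rng = np.random.default_rng(0)
width = 300
W = rng.normal(scale=2.0, size=width)     # frozen random input weights
b = rng.uniform(-2.0, 2.0, size=width)    # frozen random biases

def features(x):                          # the untrained hidden layer
    return np.tanh(x[:, None] * W + b)

x_train = np.linspace(-1.0, 1.0, 400)
y_train = np.sin(3.0 * x_train)           # a deterministic target function

readout, *_ = np.linalg.lstsq(features(x_train), y_train, rcond=None)

x_test = np.linspace(-1.0, 1.0, 97)
test_err = np.max(np.abs(features(x_test) @ readout - np.sin(3.0 * x_test)))
```

Because only `readout` is trained, the whole fit reduces to one linear solve, which is exactly why this architecture is cheap compared with a fully trained deterministic network.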


Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness

Neural Information Processing Systems

We consider a global optimization problem of a deterministic function f in a semi-metric space, given a finite budget of n evaluations. The function f is assumed to be locally smooth (around one of its global maxima) with respect to a semi-metric $\ell$. We describe two algorithms based on optimistic exploration that use a hierarchical partitioning of the space at all scales. A first contribution is an algorithm, DOO, that requires the knowledge of $\ell$. We report a finite-sample performance bound in terms of a measure of the quantity of near-optimal states.
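A minimal sketch of DOO on the unit interval with binary splitting, assuming a known smoothness modulus of the simple form $\ell(r) = L \cdot r$; the constant L and the test function are our own illustrative choices, not the paper's experimental setup.

```python
import heapq

def doo(f, budget=60, L=3.0):
    # DOO-style optimistic search on [0, 1]: repeatedly split the most
    # optimistic cell, where a cell's b-value is f(midpoint) + L * radius
    # (the optimistic upper bound under ell(r) = L * r).
    evals, leaves = [], []           # leaves: max-heap via negated b-values
    def push(lo, hi):
        mid = 0.5 * (lo + hi)
        fx = f(mid)
        evals.append((fx, mid))
        heapq.heappush(leaves, (-(fx + L * 0.5 * (hi - lo)), lo, hi, mid))
    push(0.0, 1.0)
    while len(evals) < budget:
        _, lo, hi, mid = heapq.heappop(leaves)   # most optimistic cell
        push(lo, mid)                            # split it in two
        push(mid, hi)
    return max(evals)                # (best value found, its midpoint)
```

The hierarchical partitioning is the binary interval tree, and "optimistic" means a cell is expanded only if its upper bound could still beat the best value seen, so evaluations concentrate near the maximum.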


SAFFRON and LORD Ensure Online Control of the False Discovery Rate Under Positive Dependence

Fisher, Aaron

arXiv.org Machine Learning

Online testing procedures assume that hypotheses are observed in sequence, and allow the significance thresholds for upcoming tests to depend on the test statistics observed so far. Some of the most popular online methods include alpha investing, LORD++ (hereafter, LORD), and SAFFRON. These three methods have been shown to provide online control of the "modified" false discovery rate (mFDR). However, to our knowledge, they have only been shown to control the traditional false discovery rate (FDR) under an independence condition on the test statistics. Our work bolsters these results by showing that SAFFRON and LORD additionally ensure online control of the FDR under nonnegative dependence. Because alpha investing can be recovered as a special case of the SAFFRON framework, the same result applies to this method as well. Our result also allows for certain forms of adaptive stopping times, for example, stopping after a certain number of rejections have been observed. For convenience, we also provide simplified versions of the LORD and SAFFRON algorithms based on geometric alpha allocations.
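A simplified LORD-style procedure with a geometric alpha allocation can be sketched as follows; the wealth constants and the way past rejections replenish alpha follow one common simplified presentation and are not necessarily the paper's precise variant.

```python
# Hedged sketch of a LORD-style online test with a geometric alpha allocation
# gamma(j) = (1 - r) * r**(j - 1); constants (alpha, w0, r) are illustrative.
def lord_geometric(p_values, alpha=0.05, w0=0.025, r=0.5):
    gamma = lambda j: (1.0 - r) * r ** (j - 1)   # geometric weights, sum to 1
    rejection_times, decisions = [], []
    for t, p in enumerate(p_values, start=1):
        # Threshold built from initial wealth plus earnings from past rejections.
        alpha_t = w0 * gamma(t)
        if rejection_times:
            alpha_t += (alpha - w0) * gamma(t - rejection_times[0])
            alpha_t += alpha * sum(gamma(t - tau) for tau in rejection_times[1:])
        reject = p <= alpha_t
        decisions.append(reject)
        if reject:
            rejection_times.append(t)
    return decisions
```

Note the defining feature of online testing: the threshold `alpha_t` for each hypothesis depends only on the decisions made so far, never on future p-values.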


Comment on "Blessings of Multiple Causes"

Ogburn, Elizabeth L., Shpitser, Ilya, Tchetgen, Eric J. Tchetgen

arXiv.org Machine Learning

This scenario is directly analogous to longitudinal causal inference problems with multiple time-varying treatments that contain time-varying confounders: variables that serve as confounders for some treatments and as mediators for other treatments. If there is an unmeasured confounder for the R-Y relationship (represented by V and the dashed arrows in Figure 1(a)), then conditioning on R fails to identify the direct effects of A on Y, because it opens a confounding pathway through V. See Hernan and Robins (2020) for an overview of these issues. The answer to the question posed in Appendix B of WB, "Can the causes be causally dependent among themselves?" is therefore "no." If they are causally dependent, then the deconfounder, by dint of rendering the causes independent, breaks some of the structure among the causes A, and as was originally established in the time-varying treatment setting, this undermines the identification of joint effects of A on Y by covariate adjustment. Analysis of Lemma 4. This simple argument also serves as a counterexample to Lemma 4, which states that the deconfounder does not pick up any post-treatment variables and can be treated as a pre-treatment covariate. This is necessarily false whenever the causes are causally dependent among themselves, but it need not hold even if the causes are not causally dependent; see below. The proof of Lemma 4 in Appendix I states that "Inferring the substitute confounder Z


Credit Assignment Techniques in Stochastic Computation Graphs

Weber, Théophane, Heess, Nicolas, Buesing, Lars, Silver, David

arXiv.org Machine Learning

Stochastic computation graphs (SCGs) provide a formalism to represent structured optimization problems arising in artificial intelligence, including supervised, unsupervised, and reinforcement learning. Previous work has shown that an unbiased estimator of the gradient of the expected loss of SCGs can be derived from a single principle. However, this estimator often has high variance and requires a full model evaluation per data point, making this algorithm costly in large graphs. In this work, we address these problems by generalizing concepts from the reinforcement learning literature. We introduce the concepts of value functions, baselines and critics for arbitrary SCGs, and show how to use them to derive lower-variance gradient estimates from partial model evaluations, paving the way towards general and efficient credit assignment for gradient-based optimization. In doing so, we demonstrate how our results unify recent advances in the probabilistic inference and reinforcement learning literature.
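The variance-reduction role of a baseline can be seen in the simplest SCG, a single Gaussian sampling node: the score-function gradient estimator with a subtracted baseline has visibly lower variance than the naive one. The objective and all constants below are our own toy choices, not the paper's experiments.

```python
import numpy as np

# Score-function ("likelihood ratio") gradient of E_{x ~ N(mu, 1)}[f(x)] with
# respect to mu, with and without a baseline. For f(x) = (x - 2)^2 and mu = 1,
# the true gradient is 2 * (mu - 2) = -2.
rng = np.random.default_rng(0)
mu, n = 1.0, 200_000
f = lambda x: (x - 2.0) ** 2

x = rng.normal(mu, 1.0, size=n)
score = x - mu                        # d/dmu of log N(x; mu, 1)

grad_naive = f(x) * score             # plain score-function estimator
baseline = f(x).mean()                # batch-mean baseline (subtracting any
                                      # constant leaves the estimator unbiased)
grad_base = (f(x) - baseline) * score
```

Both sample means converge to -2, but the baselined estimator's per-sample variance is markedly smaller; value functions and critics in the paper generalize this same subtraction to arbitrary nodes of an SCG.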


Pathologies in information bottleneck for deterministic supervised learning

Kolchinsky, Artemy, Tracey, Brendan D., Van Kuyk, Steven

arXiv.org Machine Learning

Information bottleneck (IB) is a method for extracting information from one random variable X that is relevant for predicting another random variable Y. To do so, IB identifies an intermediate "bottleneck" variable T that has low mutual information I(X; T) and high mutual information I(Y; T). The IB curve characterizes the set of bottleneck variables that achieve maximal I(Y; T) for a given I(X; T), and is typically explored by optimizing the IB Lagrangian, I(Y; T) - βI(X; T). Recently, there has been interest in applying IB to supervised learning, particularly for classification problems that use neural networks. In most classification problems, the output class Y is a deterministic function of the input X, which we refer to as "deterministic supervised learning". We demonstrate three pathologies that arise when IB is used in any scenario where Y is a deterministic function of X: (1) the IB curve cannot be recovered by optimizing the IB Lagrangian for different values of β; (2) there are "uninteresting" solutions at all points of the IB curve; and (3) for classifiers that achieve low error rates, the activity of different hidden layers will not exhibit a strict tradeoff between compression and prediction, contrary to a recent proposal. To address problem (1), we propose a functional that, unlike the IB Lagrangian, can recover the IB curve in all cases. We finish by demonstrating these issues on the MNIST dataset. The information bottleneck (IB) method (Tishby et al., 1999) provides a principled way to extract information that is present in one variable that is relevant for predicting another variable. Given two random variables X and Y, IB posits a "bottleneck" variable T that obeys the Markov condition Y → X → T. By the data processing inequality (DPI) (Cover & Thomas, 2012), this Markov condition implies that I(X; T) ≥ I(Y; T), meaning the bottleneck variable cannot contain more information about Y than it does about X.
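The Markov condition and its data-processing consequence can be checked numerically for small discrete distributions; the joint distributions below are arbitrary toy choices of ours, not taken from the paper.

```python
import numpy as np

def mi(pab):
    # Mutual information in bits from a joint distribution matrix P(A, B).
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return float((pab[mask] * np.log2(pab[mask] / (pa @ pb)[mask])).sum())

p_y = np.array([0.4, 0.6])                    # P(Y)
p_x_given_y = np.array([[0.9, 0.1, 0.0],      # P(X | Y): rows sum to 1
                        [0.1, 0.2, 0.7]])
p_t_given_x = np.array([[0.8, 0.2],           # P(T | X): a stochastic encoder
                        [0.5, 0.5],
                        [0.1, 0.9]])

p_yx = p_y[:, None] * p_x_given_y               # joint P(Y, X)
p_xt = p_yx.sum(axis=0)[:, None] * p_t_given_x  # joint P(X, T)
p_yt = p_yx @ p_t_given_x                       # joint P(Y, T) via Y -> X -> T

i_xt, i_yt = mi(p_xt), mi(p_yt)
ib_lagrangian = i_yt - 0.5 * i_xt               # I(Y;T) - beta * I(X;T), beta = 0.5
```

Because T is built from X alone, the chain Y → X → T holds by construction, and the computed values obey the DPI ordering I(X; T) ≥ I(Y; T) regardless of the encoder chosen.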